NSF PAR Search | NSF Public Access Repository

Transfer Learning with Uncertainty Quantification: Random Effect Calibration of Source to Target (RECaST)

Hickey, Jimmy; Williams, Jonathan P; Hector, Emily C (November 2024, Journal of Machine Learning Research)

Transfer learning uses a data model, trained to make predictions or inferences on data from one population, to make reliable predictions or inferences on data from another population. Most existing transfer learning approaches are based on fine-tuning pre-trained neural network models, and fail to provide crucial uncertainty quantification. We develop a statistical framework for model predictions based on transfer learning, called RECaST. The primary mechanism is a Cauchy random effect that recalibrates a source model to a target population; we mathematically and empirically demonstrate the validity of our RECaST approach for transfer learning between linear models, in the sense that prediction sets will achieve their nominal stated coverage, and we numerically illustrate the method's robustness to asymptotic approximations for nonlinear models. Whereas many existing techniques are built on particular source models, RECaST is agnostic to the choice of source model, and does not require access to source data. For example, our RECaST transfer learning approach can be applied to a continuous or discrete data model with linear or logistic regression, deep neural network architectures, etc. Furthermore, RECaST provides uncertainty quantification for predictions, which is mostly absent in the literature. We examine our method's performance in a simulation study and in an application to real hospital data.
Full Text Available
Turning the information-sharing dial: Efficient inference from different data sources

https://doi.org/10.1214/24-EJS2265

Hector, Emily C; Martin, Ryan (January 2024, Electronic Journal of Statistics)

A fundamental aspect of statistics is the integration of data from different sources. Classically, Fisher and others were focused on how to integrate homogeneous (or only mildly heterogeneous) sets of data. More recently, as data are becoming more accessible, the question of whether data sets from different sources should be integrated is becoming more relevant. The current literature treats this as a question with only two answers: integrate or don’t. Here we take a different approach, motivated by information-sharing principles coming from the shrinkage estimation literature. In particular, we deviate from the do/don’t perspective and propose a dial parameter that controls the extent to which two data sources are integrated. How far this dial parameter should be turned is shown to depend, for example, on the informativeness of the different data sources as measured by Fisher information. In the context of generalized linear models, this more nuanced data integration framework leads to relatively simple parameter estimates and valid tests/confidence intervals. Moreover, we demonstrate both theoretically and empirically that setting the dial parameter according to our recommendation leads to more efficient estimation compared to other binary data integration schemes.
Full Text Available
Distributed Inference for Spatial Extremes Modeling in High Dimensions

https://doi.org/10.1080/01621459.2023.2186886

Hector, Emily C.; Reich, Brian J. (April 2023, Journal of the American Statistical Association)

Full Text Available
A Distributed and Integrated Method of Moments for High-Dimensional Correlated Data Analysis

https://doi.org/10.1080/01621459.2020.1736082

Hector, Emily C.; Song, Peter X.-K. (April 2020, Journal of the American Statistical Association)

Full Text Available
